Covering Number as a Complexity Measure for POMDP Planning and Learning
Authors
Abstract
Finding a meaningful way to characterize the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number of the reachable belief space, i.e., the set of belief points reachable from the initial belief point, has interesting theoretical links with the complexity of POMDP planning. In this paper, we present empirical evidence on several small-scale benchmark problems that the covering number of the reachable belief space (or just the "covering number", for brevity) is a far better complexity measure than state-space size for both planning and learning in POMDPs. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset, given knowledge of the covering number.
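To make the central quantity concrete: a δ-covering number of a belief set is the smallest number of belief points needed so that every point in the set lies within distance δ of some center. The sketch below estimates it for a finite sample of belief points with a greedy cover under the L1 distance; the function name, the greedy strategy, and the choice of metric are illustrative assumptions, not the paper's algorithm (greedy covering only upper-bounds the true covering number).

```python
import numpy as np

def greedy_covering_number(beliefs, delta):
    """Greedy upper-bound estimate of the delta-covering number of a
    finite set of belief points (probability vectors), using L1 distance.
    Illustrative sketch only; not the algorithm from the paper."""
    uncovered = list(beliefs)
    centers = []
    while uncovered:
        # Pick the next uncovered point as a new cover center.
        c = uncovered.pop(0)
        centers.append(c)
        # Discard every point within L1 distance delta of this center.
        uncovered = [b for b in uncovered if np.abs(b - c).sum() > delta]
    return len(centers)

# Example: belief points over a 2-state POMDP, b = [p, 1 - p].
pts = [np.array([p, 1.0 - p]) for p in (0.0, 0.05, 0.5, 0.55, 1.0)]
print(greedy_covering_number(pts, delta=0.2))  # → 3
```

In this example the beliefs near p = 0.0 and p = 0.05 collapse into one cover ball (their L1 distance is 0.1 ≤ 0.2), as do p = 0.5 and p = 0.55, leaving three centers; a small covering number like this signals that the reachable belief space is effectively low-complexity regardless of state-space size.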
Similar papers
Covering Number for Efficient Heuristic-based POMDP Planning
The difficulty of POMDP planning depends on the size of the search space involved. Heuristics are often used to reduce the search space size and improve computational efficiency; however, there are few theoretical bounds on their effectiveness. In this paper, we use the covering number to characterize the size of the search space reachable under heuristics and connect the complexity of POMDP pl...
What makes some POMDP problems easy to approximate?
Point-based algorithms have been surprisingly successful in computing approximately optimal solutions for partially observable Markov decision processes (POMDPs) in high dimensional belief spaces. In this work, we seek to understand the belief-space properties that allow some POMDP problems to be approximated efficiently and thus help to explain the point-based algorithms’ success often observe...
Approximate Planning in Large POMDPs via Reusable Trajectories
We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies in a partially observable Markov decision process (POMDP). We assume we are given the ability to simulate the POMDP, and study what might be called the sample complexity — that is, the amount of data one must generate in the POMDP in order to choose a good strategy. We prove upper bounds on t...
Covering Number: Analyses for Approximate Continuous-state POMDP Planning (Extended Abstract)
To date, many theoretical results on discrete POMDPs have not yet been extended to continuous-state POMDPs, due to the infinite dimensionality of the belief space in a continuous-state case. In this paper, we define a distance in the ℓn-metric space with respect to a partitioning representation of the continuous-state space, and formalize the size of the search space reachable under inadmissible ...